Abstract:
REGULATION OF ISSUE OF PROCESSOR INSTRUCTIONS. The invention relates to a system and method for reducing power consumption by regulating the issue of selected problematic instructions. A power regulation unit within a processor maintains instruction issue counts for associated instruction types. The instruction types can be a subset of the supported instruction types executed by an execution core within the processor, and can be chosen based on high estimated power consumption for processing instructions of these types. The power regulation unit can determine that a given instruction issue count exceeds a given threshold. In response, the power regulation unit can select given instruction types whose respective issue rates are to be limited. The power regulation unit can choose an issue rate for each of the selected instruction types and limit the associated issue rate to the chosen issue rate. The selection of the given instruction types and of the associated issue rate limits is programmable.
Publication number:BR102012024721B1
Application number:R102012024721-6
Filing date:2012-09-27
Publication date:2021-01-05
Inventors:Daniel C. Murray;Andrew J. Beaumont-Smith;John H. Mylius;Peter J. Bannon;Toshi Takayanagi;Jung Wook Cho
Applicant:Apple Inc.;
Main IPC classification:
Description:

BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
[001] The present invention relates to computing systems and, more specifically, to efficiently reducing power consumption by regulating the issue of selected problematic instructions.
DESCRIPTION OF THE RELATED ART
[002] The geometric dimensions of devices and metal routes decrease with each generation of semiconductor processor cores. Therefore, more functionality is provided within a given area of on-die real estate. As a result, mobile devices, such as laptop computers, tablet computers, smartphones, video cameras, and the like, are increasingly popular. Typically, these mobile devices receive electrical power from one or more battery cells. Since batteries have a limited capacity, they are periodically connected to an external charger to be recharged. A vital issue for these mobile devices is power consumption. As power consumption increases, the battery life of these devices is reduced and the frequency of recharging increases.
[003] As the density of integrated circuits on a die increases with multiple pipelines, larger caches, and more complex logic, the number of nodes and buses that can switch per clock cycle increases significantly. Therefore, power consumption increases. In addition, a software application can execute specific program code that causes the hardware to reach a high power dissipation value. Such code may do so either unintentionally or intentionally (for example, a power virus). The power dissipation can increase due to multiple occurrences of given instruction types within the program code. This power dissipation value can reach or exceed the thermal design power (TDP) of the chip, or even the maximum chip power dissipation.
[004] In addition to the above, the cooling system of a mobile device can be designed for a given thermal design power (TDP), or thermal design point. The cooling system may be able to dissipate a TDP value without exceeding a maximum junction temperature for the semiconductor die. However, multiple occurrences of given instruction types can cause the power dissipation to exceed the TDP for the semiconductor chip. In addition, there are current limits for the power supply that can be exceeded as well. If the power modes do not change the operating mode of the chip or turn off specific blocks within the chip, then the battery can be quickly discharged. In addition, physical damage can occur. Although one approach to managing peak power dissipation may be simply to limit instruction issue so that it does not exceed a specific threshold, this can result in an unacceptable reduction in overall performance.
[005] In view of the above, efficient methods and mechanisms for reducing power consumption by regulating the issue of selected instructions are desired.
SUMMARY OF EMBODIMENTS OF THE INVENTION
[006] Systems and methods for reducing power consumption by regulating the issue of selected instructions are contemplated.
[007] In one embodiment, a processor includes a power regulation unit. The power regulation unit can be used within the same pipeline stage as a scheduler. The power regulation unit maintains one or more instruction issue counts for one or more instruction types. The instruction types can be a subset of the supported instruction types executed by an execution core within the processor. The instruction types can be chosen based on high estimated power consumption for processing instructions of these types. For example, a floating-point (FP) single-instruction multiple-data (SIMD) instruction type can have wide data lanes for processing vector elements during a multi-cycle latency. While maintaining the instruction issue counts, the power regulation unit may determine that a given instruction issue count exceeds a given threshold. In response, the power regulation unit can select one or more instruction types whose respective issue rates are to be limited.
[008] The selection of the one or more instruction types can be based on a power state estimate. Alternatively, this selection can be based on user changes made through software. The power regulation unit can choose an issue rate for each of the selected one or more candidate instruction types. This choice of issue rate can also be based on a power state estimate or on software updates to specific control registers. The power regulation unit can limit the associated issue rate for each of the selected one or more instruction types to the respective chosen issue rate. Therefore, the issue rate limit can change; that is, it is programmable.
[009] These and other embodiments will be further appreciated upon reference to the following description and drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] Figure 1 is a generalized block diagram of one embodiment of a processor core that performs out-of-order execution.
[0011] Figure 2 is a generalized block diagram of one embodiment of power management for a semiconductor chip.
[0012] Figure 3 is a generalized block diagram illustrating one embodiment of a power regulation unit.
[0013] Figure 4 is a generalized flowchart illustrating one embodiment of a method for controlling an instruction issue rate for specific instruction types.
[0014] Figure 5 is a generalized block diagram illustrating one embodiment of a regulation table.
[0015] Figure 6 is a generalized block diagram illustrating one embodiment of limiting an instruction issue rate.
[0016] Figure 7 is a generalized block diagram illustrating another embodiment of limiting an instruction issue rate.
[0017] Figure 8 is a generalized block diagram illustrating yet another embodiment of limiting an instruction issue rate.
[0018] Figure 9 is a generalized flowchart of one embodiment of a method for controlling an instruction issue rate of specific instruction types.
[0019] Although the invention is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and will be described in detail here. It must be understood, however, that the drawings and their detailed description are not intended to limit the invention to the specific form described; on the contrary, the intention is to cover all modifications, equivalents and alternatives that fall within the spirit and scope of the present invention as defined by the embodiments. As used throughout this application, the word "can" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words "include", "including", and "includes" mean including, but not limited to.
[0020] Various units, circuits, or other components can be described as "configured to" perform a task or tasks. In such contexts, "configured to" is a broad recitation of structure that generally means "having circuitry that" performs the task or tasks during operation. As such, the unit/circuit/component can be configured to perform the task even when the unit/circuit/component is not currently powered on. In general, the circuitry that forms the structure corresponding to "configured to" can include hardware circuits. Similarly, various units/circuits/components can be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase "configured to". Reciting a unit/circuit/component that is configured to perform one or more tasks is expressly intended not to invoke interpretation under 35 U.S.C. § 112, paragraph six, for that unit/circuit/component.
DETAILED DESCRIPTION
[0021] In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, one skilled in the art should recognize that the invention could be practiced without these specific details. In some cases, well-known circuits, structures, and techniques have not been shown in detail to avoid obscuring the present invention.
[0022] Referring to Figure 1, a generalized block diagram illustrating one embodiment of a processor core 100 that performs out-of-order execution is shown. Processor 100 can use a multi-stage pipeline for processing instructions. An instruction cache (i-cache) 104 can store instructions for a software application. One or more instructions indicated by an address conveyed by the address selection logic 102 can be fetched from i-cache 104. Multiple instructions can be fetched from i-cache 104 per clock cycle if there are no i-cache misses. The address can be incremented by the next fetch predictor 106. A branch direction predictor 108 can be coupled to each of the next fetch predictor 106 and the control flow evaluation logic 112 in a later pipeline stage. Predictor 106 can predict information about instructions that change the flow of an instruction stream away from executing a next sequential instruction.
[0023] The decoding unit 110 decodes the opcodes of the multiple fetched instructions. Alternatively, the instructions can be divided into microinstructions, or micro-ops. As used herein, the terms "instructions" and "micro-ops" are interchangeable, since the invention can be used with an architecture that uses either implementation. The decoding unit 110 can allocate entries in a dispatch queue 114. In one embodiment, the control flow evaluation block 112 can alter instruction fetching in the address selector 102. For example, an absolute address value associated with an unconditional branch opcode can be sent to address selector 102. Instructions in dispatch queue 114 have their associated operand and destination identifiers renamed by renaming network 118. Renaming network 118 can receive candidate names from a free list allocator 120.
[0024] A dependency merge block 122 can generate dependency vectors for the instructions received. Renamed identifiers selected in the previous pipeline stage can be used to find and indicate the dependencies between the instructions. The dependency merge block 122 can provide the instructions and the associated renamed identifiers, program counter (PC) values, dependency vectors, and so on to scheduler 124.
[0025] Scheduler 124 can schedule instructions for execution in execution core 130. When operands are available and hardware resources are also available, an instruction can be issued out of order from scheduler 124 to one of the functional units within the execution core 130. Scheduler 124 can read its source operands from an architectural register file (not shown) or from operand bypass logic. The source operands can be provided to the execution core 130.
[0026] Execution core 130 can detect various events during the execution of instructions that can be reported to scheduler 124. Two examples are mispredicted branch instructions and replayed load/store instructions. Various exceptions can be detected. Two examples are protection exceptions for memory accesses or for privileged instructions being executed in a non-privileged mode, and exceptions for no address translation. Exceptions can cause a corresponding exception handling routine to be executed, such as by microcode 116.
[0027] The execution core can include a load/store unit 140. The load/store unit 140 can include buffers for storing addresses that correspond to store instructions, a store queue for storing the data that corresponds to the store instructions, and load buffers for storing both addresses and data for load instructions. The load/store unit 140 can be connected to a data cache (not shown). Processor 100 can include a translation lookaside buffer (TLB) for each of the i-cache 104 and the data cache to avoid the cost of performing a full memory translation when performing a cache access.
[0028] The execution core 130 can include several computation units that perform at least addition, subtraction, shifting, bitwise logical operations, rotation, and/or other functions. In the example shown, execution core 130 includes an integer arithmetic logic unit (ALU) 132, a combined integer ALU and branch resolution logic 134, a floating-point unit 136 with both floating-point addition and subtraction and single-instruction multiple-data (SIMD) computation logic, and a floating-point unit 138 with both floating-point multiplication and SIMD computation logic.
[0029] SIMD instructions can perform the same operation on multiple data lanes. SIMD instructions can use a register file separate from the architectural register file used by other instructions in the instruction set architecture (ISA). The registers for the SIMD instructions can be used as vectors of elements of the same data type. SIMD instructions can be used to support media acceleration, data processing and graphics. Processor 100 can support both single-precision and double-precision floating-point operations, both signed and unsigned. One or more of these SIMD instructions may have a multi-cycle latency.
[0030] Due to heavy processing and long latency, the execution of specific instructions can draw appreciable amounts of current from the power supply. Consequently, these same instructions consume significant power. These high-power instructions can be referred to as problematic instructions. The issue rate of these problematic instructions can be changed on a per-clock-cycle basis in order to maintain a given level of power consumption on the chip. For example, the power regulation unit 150 can monitor an issue rate of problematic instructions. When a qualifying condition occurs, the power regulation unit can limit the issue rate of problematic instructions to a programmable reduced rate.
[0031] Referring now to Figure 2, a generalized block diagram illustrating one embodiment of a power management system 200 for a semiconductor chip is shown. In various embodiments, processor 100 can be configured to limit the issue rate of specific instruction types using the power regulation unit 150. The power regulation unit 150 can be controlled by software or hardware. In one embodiment, a specific bit field within a given hardware configuration register 270 within processor 100 can be updated with a regulation code by a software layer 260. The power regulation unit can use the regulation code to determine an instruction issue rate limit for specific instruction types. Software layer 260 can be a user program, a kernel in an operating system, or other software. In alternative embodiments, other parameter values, instead of a regulation code, can be used to determine an instruction issue rate limit for specific instruction types. These and other parameters are further described below. In still other embodiments, a power manager 210 can adjust a regulation code or other parameters used by the power regulation unit 150 to adjust the instruction issue rate limit for specific instruction types.
[0032] Processor 100 can be any integrated circuit (IC). In various embodiments, processor 100 can include one or more cores, each with on-die instruction and data caches. Processor 100 can be a superscalar processor with a single pipeline or multiple pipelines. In another embodiment, the processor can be an application-specific IC (ASIC). In yet another embodiment, processor 100 can include one or more computational blocks within a system on a chip (SoC). Any transistor family can be used to implement processor 100. Examples include metal oxide semiconductor field effect transistors (MOSFETs) and bipolar junction transistors (BJTs).
[0033] In one embodiment, the software layer 260 can update a hardware configuration register 270 within processor 100 with a regulation code. In one embodiment, input/output (I/O) pins on processor 100 can provide access to the hardware configuration register. Alternatively, specific instructions of a user software program or an operating system kernel can be executed by the hardware of processor 100 and update the configuration register with a specific value. In one example, configuration register 270 is a supervisor-level exception handling register with a reserved miscellaneous bit field. A specified portion of the miscellaneous bit field can be used, with care, to store the regulation code. Other examples of user-level and supervisor-level configuration registers can also be used. In addition, other values can be stored in the hardware configuration register 270. One example of another value is an instruction issue rate limit. Other examples include an increment value and a decrement value, discussed below, used to maintain an instruction count and produce an instruction issue rate limit. Alternatively, hardware logic can be used to perform these steps as described below.
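As a purely illustrative sketch of how software might place a regulation code in a reserved portion of a configuration register, the following Python fragment performs a read-modify-write on a bit field. The field position (bits 11:8), the field width, and the function names are hypothetical assumptions; the description does not specify a register layout.

```python
def write_bit_field(register: int, value: int, shift: int, width: int) -> int:
    """Return `register` with the `width`-bit field at `shift` replaced by `value`."""
    mask = (1 << width) - 1
    if value & ~mask:
        raise ValueError("value does not fit in the field")
    # Clear the field, then merge the new value; all other bits are preserved.
    return (register & ~(mask << shift)) | (value << shift)


def read_bit_field(register: int, shift: int, width: int) -> int:
    """Extract the `width`-bit field located at bit position `shift`."""
    return (register >> shift) & ((1 << width) - 1)


# Hypothetical layout: a 4-bit regulation code held in bits [11:8].
REGULATION_CODE_SHIFT = 8
REGULATION_CODE_WIDTH = 4

config = 0xABCD1234  # arbitrary prior register contents
config = write_bit_field(config, 0x5, REGULATION_CODE_SHIFT, REGULATION_CODE_WIDTH)
```

Only the chosen field changes, which is what allows a reserved portion of a register that serves other purposes to be reused.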
[0034] In one embodiment, the power manager state machine 220 can be used to collect data from processor 100. While an application or applications are running, an estimated real-time power consumption of processor 100 can be conveyed to the power manager state machine 220. Any of a variety of techniques can be used to determine the power consumption of processor 100. In one embodiment, on-die current sensors can provide estimates of the current drawn to the power manager state table 230. In another embodiment, the data can include binary logic values or weighted numeric values of selected sampled control and data signals. After collecting the data, the power manager state machine 220 can estimate the power consumption of processor 100 and determine changes to the operating parameters of processor 100.
[0035] In various embodiments, a power target can be assigned to processor 100. The power target can, for example, be a thermal design power value. Thermal design power (TDP), which can also be referred to as a thermal design point, represents the maximum amount of power that a cooling system is capable of effectively dissipating for processor 100. If a high-power application or virus runs on processor 100, the power manager state machine 220 can make adjustments to the operating voltage, the operating frequency, or both. Generally speaking, power consumption is proportional to the operating frequency and operating voltage of processor 100. In response to receiving updated estimated power data from processor 100, the power manager state machine 220 can select one power-performance state (P-state) from several possible P-states. The selected P-state can lie between a maximum performance state and a minimum power state. The maximum performance state can correspond to a maximum operating frequency and the minimum power state can correspond to a minimum operating frequency. The intermediate discrete P-states between these two states can include given scaled values for a combination of the operating frequency and the operating voltage.
[0036] The P-state selected by the power manager state machine 220 can be indicated by a given power state code. This power state code can be used to index the power state table 230. In one embodiment, the power state table 230 includes multiple entries 240a-240g, each including multiple fields 242-248. Field 242 can hold a power state code. The power state code sent from the power manager state machine can be compared with the value stored in field 242 of each of the entries 240a-240g. A given entry of the entries 240a-240g that has a matching power state code can be used to provide one or more of the values stored in the other fields 244-248 to processor 100. Field 244 can store an operating frequency associated with the P-state indicated by the power state code stored in field 242. Similarly, field 246 can store an operating voltage associated with the P-state indicated by the power state code stored in field 242.
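The lookup just described can be sketched in Python as a search for the entry whose stored code matches the incoming power state code. This is a minimal illustration only: the class name, field names, units, and all numeric values are assumptions, since the description gives no concrete frequencies, voltages, or codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PowerStateEntry:
    power_state_code: int   # corresponds to field 242
    frequency_mhz: int      # corresponds to field 244 (units assumed)
    voltage_mv: int         # corresponds to field 246 (units assumed)
    regulation_code: int    # corresponds to field 248


# Illustrative values only; none of these numbers come from the description.
POWER_STATE_TABLE = [
    PowerStateEntry(0, 1000, 1100, 0),
    PowerStateEntry(1, 800, 1000, 1),
    PowerStateEntry(2, 600, 900, 2),
]


def lookup_power_state(code: int) -> Optional[PowerStateEntry]:
    """Compare `code` with the stored code of each entry; return the match, if any."""
    for entry in POWER_STATE_TABLE:
        if entry.power_state_code == code:
            return entry
    return None
```

A call such as `lookup_power_state(1)` returns the whole matching entry, so the frequency, voltage, and regulation code are supplied together.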
[0037] Field 248 can store a power regulation code. The power regulation code can be associated with a specific subset of the instruction types that are processed by processor 100. The specific subset of instruction types can be referred to as candidate instruction types. The power regulation code can indicate an instruction issue limit for the candidate instruction types. In one example, a given power regulation code can indicate an instruction issue limit expressed as a percentage. This percentage can be defined as the maximum number of clock cycles in which an instruction of a candidate instruction type is allowed to be issued within a given number of clock cycles. For example, a percentage of 50% may indicate that an instruction of a candidate instruction type can be issued for execution in at most 1 clock cycle out of every 2 clock cycles. A percentage of 66% may indicate that an instruction of a candidate instruction type can be issued for execution in at most 2 clock cycles out of every 3 clock cycles. Other issue rate limit percentages are possible and contemplated, as discussed below. As the P-states change during the execution of applications on processor 100, the power regulation code can also change, either to provide more performance or to decrease power consumption.
[0038] A given group of candidate instruction types may include one or more instruction types that have been determined or estimated to consume appreciable power during execution. For example, single-instruction multiple-data (SIMD) instructions typically have wide data lanes for simultaneous processing of multiple vector elements. In addition, one or more SIMD instructions can have appreciable latency. Instructions with latencies of 8 to 12 clock cycles or more across a significant number of data lanes can draw an appreciable amount of current from the power supply during instruction execution.
[0039] Some examples of instruction types with high power consumption include SIMD floating-point (FP) multiply-add, SIMD FP multiply, SIMD FP add, SIMD FP square root, SIMD FP reciprocal square root, SIMD add, and so on. Other high power consumption instructions are possible and contemplated. A given group of candidate instruction types may include one or more of the identified instruction types. A given group of candidate instruction types can be larger than a second group and be associated with a stricter instruction issue rate limit when a given power regulation code associated with a large reduction in power consumption is selected from the power state table 230.
[0040] Turning now to Figure 3, a generalized block diagram illustrating one embodiment of a power regulation unit 300 is shown. In one embodiment, the power regulation unit 300 is located inside processor 100 and used in the same pipeline stage as scheduler 124. The power regulation unit 300 can include a regulation table 310. In one embodiment, regulation table 310 includes multiple entries 320a-320j, each including multiple fields 322-328. Field 322 can store a power regulation code. Regulation table 310 can be indexed by a power regulation code value. In one embodiment, a power regulation code value can be stored in a configuration register, such as a status control register. This specific configuration register can be updated by software, such as a software application written by a designer, the operating system, or other software. In another embodiment, the power regulation code can be sent from the power state table 230 as shown in Figure 2. Other mechanisms for maintaining a power regulation code are possible and contemplated. A maintained power regulation code can be compared with the value stored in field 322 of each of the entries 320a-320j. A given entry of the entries 320a-320j that has a matching power regulation code can be used to provide the values stored in the other fields 324-328 to regulation logic 340.
[0041] Field 324 in the power regulation table 310 can store one or more identifiers of candidate instruction types associated with the power regulation code stored in field 322. Field 326 can store an instruction issue count threshold associated with both the candidate instruction types stored in field 324 and the power regulation code stored in field 322. Field 326 can alternatively store a threshold value for an instruction issue rate instead of an instruction issue count. Other measurements that correspond to the amount of processing performed by processor 100 to execute instructions of the identified candidate instruction types can be used. The threshold value stored in field 326 can be used by regulation logic 340 to determine when to limit an instruction issue rate for the identified candidate instruction types. The regulation logic 340 can limit this instruction issue rate to a limit value stored in field 328.
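One way to picture how the fields of a regulation table entry work together is the following Python sketch. It assumes the count-based variant of field 326 and a "reaches or exceeds" comparison; the instruction type names, the table contents, and the function name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class RegulationEntry:
    regulation_code: int         # corresponds to field 322
    candidate_types: frozenset   # corresponds to field 324
    count_threshold: int         # corresponds to field 326 (count-based variant)
    issue_rate_limit: Fraction   # corresponds to field 328


# Illustrative contents; the type names and the second entry are assumptions.
REGULATION_TABLE = {
    1: RegulationEntry(1, frozenset({"simd_fp_multiply_add"}), 128, Fraction(1, 2)),
    2: RegulationEntry(
        2, frozenset({"simd_fp_multiply_add", "simd_fp_multiply"}), 64, Fraction(1, 3)
    ),
}


def must_limit(code: int, instruction_type: str, issue_count: int) -> bool:
    """True when the entry selected by `code` calls for limiting this type."""
    entry = REGULATION_TABLE.get(code)
    if entry is None or instruction_type not in entry.candidate_types:
        return False  # not a candidate type under this regulation code
    return issue_count >= entry.count_threshold
```

Instruction types that are not listed in the selected entry are never limited, regardless of how often they issue.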
[0042] The monitoring unit 330 can maintain an instruction issue count for one or more candidate instruction types. This count value can be updated every clock cycle and depends on whether or not an instruction of a given candidate instruction type is issued to the execution core 130 within processor 100. In one embodiment, an individual count can be maintained for each candidate instruction type. In another embodiment, a count can be maintained for a group of two or more candidate instruction types. A given candidate instruction type can be included in one or more groups of candidate instruction types.
[0043] In one embodiment, a counter for a given candidate instruction type can be incremented by one for each clock cycle in which an instruction of the given candidate instruction type is issued to execution core 130. This counter can be decremented by one for each clock cycle in which an instruction of the given candidate instruction type is not issued to the execution core 130. As further described below, in this embodiment the issue rate limit is 50%. Generally, the issue rate limit can correspond to the ratio of the decrement value to the sum of the increment and decrement values. In other embodiments, values other than one can be used for the increment and decrement amounts to obtain different issue rate limits. Similarly, the increment value can differ from the decrement value. For example, an issue rate limit of 60% can be achieved by setting the increment value to 2 for each clock cycle in which an instruction of the given candidate instruction type is issued to the execution core 130 and setting the decrement value to 3 for each clock cycle in which an instruction of the given candidate instruction type is not issued to the execution core 130. In various other embodiments, each of the increment and decrement values can be programmable. In one embodiment, when the value of this counter reaches or exceeds a value stored in field 326 of a selected entry of the entries 320a-320j in table 310, regulation logic 340 can begin sending control signals to scheduler 124 to block the issue of instructions of the candidate instruction types identified by the identifiers stored in field 324.
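The relationship between the increment value, the decrement value, and the resulting rate limit can be checked with a small cycle-by-cycle simulation. The Python sketch below is an illustration under stated assumptions: a candidate instruction is always ready to issue, issue is blocked in any cycle that starts with the count at or above the threshold, and the function name and warm-up parameter are hypothetical.

```python
def simulate_issue_rate(increment: int, decrement: int, threshold: int,
                        cycles: int, warmup: int = 0) -> float:
    """Fraction of measured cycles in which a candidate instruction issues.

    Assumes a candidate instruction is ready every cycle and that issue is
    blocked whenever the count has reached or exceeded `threshold`.
    """
    count = 0
    issued = 0
    for cycle in range(warmup + cycles):
        if count >= threshold:
            # Blocked cycle: nothing issues, so the counter is decremented.
            count = max(0, count - decrement)
        else:
            # Issue cycle: the counter is incremented.
            count += increment
            if cycle >= warmup:
                issued += 1
    return issued / cycles
```

Once the counter has climbed to the threshold, an increment of 2 and a decrement of 3 settle into a repeating five-cycle pattern with three issue cycles, which matches the 60% figure, and an increment and decrement of 1 alternate between issuing and blocking.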
[0044] In one example, the SIMD FP multiply-add instruction type can be identified by field 324 in a selected entry of the entries 320a-320j in table 310. A threshold count value of 128 can be stored in field 326. The monitoring unit 330 can have a counter that increments and decrements by one as described above. When the monitoring unit 330 detects a count value of 128, the monitoring unit 330 can notify the regulation logic 340. Field 328 of the same selected entry in table 310 can store an issue rate limit of 50%. The regulation logic can send control signals to scheduler 124 to block the issue of any instructions of the SIMD FP multiply-add instruction type in the next clock cycle. The count value can be decremented to 127 as a result of the block. In a subsequent clock cycle, as the count is below the threshold, the control signals can be changed by regulation logic 340 to remove any block. An instruction of the SIMD FP multiply-add instruction type can be issued. The count can be incremented back to 128. Again, control signals are sent to scheduler 124 to prevent any instructions of this instruction type from being issued. Therefore, the maximum issue rate for SIMD FP multiply-add is set to 50%.
[0045] In one embodiment, the increment and decrement amounts can be changed when the threshold value stored in field 326 is reached. Continuing with the example above, if field 328 stores an issue rate limit of 66%, then the decrement amount can be changed to 2. When instructions of the SIMD FP multiply-add instruction type are blocked by regulation logic 340, the associated count value is decremented from 128 to 126, instead of 127. Therefore, instructions of this specific instruction type are allowed to be issued for two clock cycles before being blocked again for one clock cycle. The maximum issue rate for SIMD FP multiply-add is set to 66%. Other increment and decrement amounts can be chosen to satisfy other specified issue rate limits. For example, an issue rate limit of 33% can be achieved by changing the increment amount to two and keeping the decrement amount at one.
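The cycle-by-cycle behavior in the 50%, 66%, and 33% examples can be traced with the short Python sketch below. It starts with the count already at the threshold of 128 and records "B" for a blocked cycle and "I" for an issue cycle. The function name is hypothetical, and a candidate instruction is assumed to be ready in every cycle.

```python
def issue_pattern(start: int, threshold: int, increment: int,
                  decrement: int, cycles: int) -> str:
    """Return one character per cycle: 'B' if issue is blocked, 'I' if it occurs."""
    count = start
    pattern = []
    for _ in range(cycles):
        if count >= threshold:
            count -= decrement   # blocked cycle, e.g. 128 -> 126 when decrement is 2
            pattern.append("B")
        else:
            count += increment   # issue cycle
            pattern.append("I")
    return "".join(pattern)


fifty_percent = issue_pattern(128, 128, increment=1, decrement=1, cycles=6)
sixty_six_percent = issue_pattern(128, 128, increment=1, decrement=2, cycles=6)
thirty_three_percent = issue_pattern(128, 128, increment=2, decrement=1, cycles=6)
```

Under these assumptions the three patterns are `BIBIBI`, `BIIBII`, and `BIBBIB`: one issue cycle in every two, two in every three, and one in every three, respectively.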
[0046] In one embodiment, multiple threshold values can be stored in field 326. In that embodiment, selecting one or more candidate instruction types whose respective issue rates are to be limited, and choosing an issue rate for each of the selected one or more candidate instruction types, can additionally be based on which instruction issue count has exceeded its respective threshold stored in field 326.
[0047] Continuing with the above embodiment, two or more groups of candidate instruction types can be identified in field 324. Two or more threshold values can be stored in field 326, associated with the groups identified in field 324. Instead of limiting an issue rate for each of the two or more groups of candidate instruction types, a given group of the two or more groups can be selected to have its respective issue rate limited, based on its respective instruction issue count having exceeded its threshold. Such an embodiment can limit the size of the power regulation code and provide more flexibility in defining the power regulation codes.
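A minimal Python sketch of this per-group selection follows, assuming one issue count and one threshold per group. The group names and numeric values are hypothetical, and the strict comparison follows the "exceeded" wording of this paragraph.

```python
def groups_to_limit(issue_counts: dict, thresholds: dict) -> list:
    """Return the groups whose issue count has exceeded the group's own threshold."""
    return [
        group
        for group, threshold in thresholds.items()
        # Strictly greater than: the count must have exceeded the threshold.
        if issue_counts.get(group, 0) > threshold
    ]


# Hypothetical groups and values, for illustration only.
thresholds = {"simd_fp": 128, "load_store": 64}
issue_counts = {"simd_fp": 130, "load_store": 40}
selected = groups_to_limit(issue_counts, thresholds)
```

Here only the `simd_fp` group is selected, so the other group continues to issue without restriction.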
[0048] Turning now to Figure 4, a generalized flowchart of one embodiment of a method 400 for controlling an instruction issue rate of specific instruction types is shown. Method 400 can be modified by those skilled in the art in order to derive alternative embodiments. Also, the steps in this embodiment are shown in sequential order. However, some steps may occur in a different order than shown, some steps may be performed concurrently, some steps may be combined with other steps, and some steps may be absent in another embodiment.
[0049] In block 402, given instruction types can be selected as candidate instruction types for regulation. For example, instruction types associated with relatively high power consumption during execution can be selected for regulation. In various embodiments, floating-point SIMD instructions, load/store instructions, and others can be candidates.
[0050] In block 404, both an increment value, A, and a decrement value, B, can be chosen in order to determine an issue rate limit for the candidate instruction types. For example, in one embodiment, the issue rate limit may correspond to the ratio B / (A + B). Therefore, a given limit, expressed as the percentage of cycles that issue candidate type instructions out of a total number of cycles, can be selected. For example, to adjust, or "dial", the power regulation unit and the scheduler to a limit of 60%, an increment value of 2 and a decrement value of 3 can be chosen. These selected values yield a ratio equal to 3 / (2 + 3), or 60%.
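The ratio described for block 404 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names rate_limit and amounts_for are hypothetical and do not come from the specification. It computes B / (A + B) for given amounts and, going the other way, finds the smallest integer pair (A, B) that yields a target rate.

```python
from fractions import Fraction


def rate_limit(increment: int, decrement: int) -> Fraction:
    """Issue rate limit B / (A + B) for increment A and decrement B."""
    return Fraction(decrement, increment + decrement)


def amounts_for(target: Fraction) -> tuple[int, int]:
    """Smallest integer pair (A, B) whose ratio B / (A + B) equals target."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    # In lowest terms target = B / (A + B), so B is the numerator and
    # A is the denominator minus the numerator.
    return target.denominator - target.numerator, target.numerator


print(rate_limit(2, 3))                # 3/5, the 60% example from block 404
print(amounts_for(Fraction(60, 100)))  # (2, 3)
```

Under this reading, the 66% and 33% limits mentioned elsewhere in the description are the rounded values of 2/3 (A = 1, B = 2) and 1/3 (A = 2, B = 1).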
[0051] In one embodiment, the values A and B can be selected by user software that updates a specific hardware configuration register. The power regulation unit can read this hardware configuration register. As previously described, the input/output (I/O) pins of processor 100 can provide access to the hardware configuration register. Alternatively, specific instructions of a user software program or an operating system kernel can be executed by the hardware of processor 100 to update specific bit fields in the configuration register with specified values. In yet another embodiment, hardware control logic can select the increment and decrement values. In some embodiments, the hardware control logic can select these values based on at least a P-state value.
[0052] In block 406 of figure 4, a limit can be selected for a count of issued candidate type instructions. This limit can also be stored in a specific bit field in the hardware configuration register used for the selected increment and decrement values. In block 408, instructions of types not currently blocked are issued from the scheduler to the execution units. If any candidate type instructions are issued in this clock cycle (conditional block 410), then in block 412 the candidate type instruction issue count is incremented by the increment value. If this count value exceeds the selected limit (conditional block 414), then in block 416 a block is set on issuing the candidate type instructions that correspond to the limit. After that, the control flow of method 400 returns to block 408. On the other hand, if no candidate type instructions are issued in this clock cycle (conditional block 410), then in block 418 the candidate type instruction issue count is decremented by the decrement value. In several embodiments, the count can have a minimum value of zero.
[0053] If the candidate type instructions are blocked (conditional block 420) and this count value is below the selected limit (conditional block 422), then in block 424 the block on issuing candidate type instructions can be removed or reset. After that, the control flow of method 400 returns to block 408. Because the block has been removed, candidate type instructions can again be selected for issue. If candidate type instructions are not blocked (conditional block 420) or if the count value is not below the selected limit (conditional block 422), then the control flow of method 400 returns to block 408.
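The per-cycle flow of blocks 408 to 424 can be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware described: the class name IssueRegulator and its clock method are hypothetical, a single boolean stands in for whether the scheduler has a candidate type instruction ready, and a blocked cycle is treated as a cycle in which no candidate type instruction issues.

```python
class IssueRegulator:
    """Counter-based issue regulation following blocks 408-424 of method 400."""

    def __init__(self, increment: int, decrement: int, limit: int) -> None:
        self.increment = increment
        self.decrement = decrement
        self.limit = limit
        self.count = 0
        self.blocked = False

    def clock(self, candidate_ready: bool) -> bool:
        """Advance one clock cycle; return True if a candidate instruction issued."""
        issued = candidate_ready and not self.blocked   # block 408
        if issued:                                       # conditional block 410
            self.count += self.increment                 # block 412
            if self.count > self.limit:                  # conditional block 414
                self.blocked = True                      # block 416
        else:
            # Block 418: decrement, with a minimum count value of zero.
            self.count = max(0, self.count - self.decrement)
            if self.blocked and self.count < self.limit:  # blocks 420 and 422
                self.blocked = False                      # block 424
        return issued


# A candidate instruction is assumed to be ready in every cycle.
regulator = IssueRegulator(increment=2, decrement=3, limit=10)
issued = sum(regulator.clock(True) for _ in range(1000))
print(f"long-run issue rate: {issued / 1000:.2f}")
```

With an increment of 2 and a decrement of 3, the long-run rate in this model settles near the 60% given by B / (A + B); the limit value of 10 is arbitrary and only affects how long the initial unregulated run lasts.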
[0054] The description above of blocks 404 and 406 describes an individual selection of the increment, decrement, and limit values. Alternatively, specific combinations of these values can be stored in a table. The selection of a specific combination by selecting a specific entry in this table can be performed with the software and hardware mechanisms described above. An implementation of such a table is described below.
[0055] Referring to figure 5, a generalized block diagram of one embodiment of a regulation table 500 is shown. As shown, the power regulation codes can be stored as binary values. Candidate instruction types can be grouped together based on the estimated power consumption for executing the corresponding instructions. Latency can be used to distinguish between appreciable differences in power consumption. Here, the SIMD FP instruction types are grouped by latency. For example, when a power regulation code of 'b000 is selected, only the SIMD FP instruction types with latencies of 8 and 12 clock cycles are chosen for potential instruction issue regulation. When regulation begins, an issue rate limit of 66% can be used for the identified instruction types with latencies of 8 and 12 clock cycles.
[0056] In another embodiment, a limit value used to determine when instruction issue regulation starts can be an actual issue rate instead of an instruction issue count. For example, regulation can start when scheduler 124 has issued more than X instructions of the identified candidate instruction types within Y clock cycles. A moving window of the last Y clock cycles can be maintained that indicates, for each of the Y clock cycles, whether an associated instruction was issued or not.
[0057] In one embodiment, a shift register with a size of Y bits, where Y is an integer, can be used to maintain the movable window of the last Y clock cycles. An X count can be incremented for each clock cycle that an instruction of a candidate instruction type is issued to the execution core 130. Once Y clock cycles have passed, the X count can also be decreased when during a given clock cycle no instruction of the candidate instruction type is issued to the execution core, where the given clock cycle occurred Y clock cycles before a current clock cycle.
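The moving window can be modeled in software with an integer used as a Y-bit shift register. The sketch below is an assumption-laden illustration: the class name IssueWindow and the threshold variable are hypothetical, and the count X is recomputed from the window contents each cycle rather than modeling the separate increment and decrement bookkeeping described above.

```python
class IssueWindow:
    """Moving window over the last Y clock cycles, kept as a Y-bit shift register."""

    def __init__(self, window_cycles: int) -> None:
        self.mask = (1 << window_cycles) - 1  # keeps only the newest Y bits
        self.bits = 0                         # bit 0 is the most recent cycle

    def clock(self, issued: bool) -> int:
        """Shift in this cycle's issue bit and return the count X for the window."""
        self.bits = ((self.bits << 1) | int(issued)) & self.mask
        return bin(self.bits).count("1")


window = IssueWindow(window_cycles=4)
threshold_x = 2  # hypothetical limit X on issues within the window
for issued in [True, True, False, True, True, True]:
    count = window.clock(issued)
    print(count, "regulate" if count > threshold_x else "ok")
```

Recomputing the population count is convenient in software; a hardware implementation would more plausibly keep a running count, as the paragraph above describes.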
[0058] Referring again to regulation table 310, each of the fields 324-328 for a given power regulation code can be programmable. These values can be changed from their initial values after program execution has started. Similarly, each of the fields within table 500 for a given power regulation code can be programmable. A correlation between the power state code sent from the power management state machine 220 and the power regulation code can be programmable as well. Through software, a user can update specific control registers that store the values corresponding to entries within table 310 or table 500.
[0059] Turning now to figure 6, a generalized block diagram of one embodiment of limiting an instruction issue rate is shown. As shown, in this example, a count limit of 3 is used for ease of illustration; any count value can be selected. Before the limit is reached, each of the increment and decrement amounts has a value of one. An issue rate limit of 50% is selected. Therefore, in this example, the increment and decrement amounts remain at one after the limit is reached.
[0060] In clock cycle (CC) 1, a problematic instruction of the identified candidate instruction types is issued to the execution core 130. Therefore, a count is incremented from 0 to 1. Similarly, in CC 2, a problematic instruction is issued and the count is incremented, this time to 2. In CC 3, no problematic instruction is selected for issue, possibly because no problematic instructions are available within scheduler 124 or because the selection logic within scheduler 124 has selected other instructions for issue. Consequently, the count is decremented from 2 to 1.
[0061] In each of CC 4 and CC 5, a problematic instruction is issued and the count is incremented. In CC 5, the count reaches the limit value of 3. Therefore, regulation logic 340 can send control signals to scheduler 124 to block the issue of any problematic instructions of the identified candidate instruction types. In CC 6, no problematic instructions are issued because of the block, and the count is decremented from 3 to 2. The block is removed since the count is below the limit. In CC 7, a problematic instruction is issued and the count is incremented, again reaching the limit value of 3. In CC 8 and CC 9, a block and an issue occur as they did in CC 6 and CC 7. The issue rate for the problematic instructions has reached a maximum limit of one instruction issued per two clock cycles, or 50% of the time.
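The cycle-by-cycle walkthrough of figure 6 can be reproduced with a short trace. The sketch below is a reading of the prose, not of the figure itself: the function name trace_issue is hypothetical, and it assumes that blocking starts when the count reaches the limit and ends once the count falls below it, which is what the CC 5 to CC 9 narrative implies.

```python
def trace_issue(ready, limit=3, increment=1, decrement=1):
    """Return an (issued, count) pair for each clock cycle in ready."""
    count, blocked, rows = 0, False, []
    for candidate_ready in ready:
        issued = candidate_ready and not blocked
        if issued:
            count += increment
        else:
            count = max(0, count - decrement)
        # Block while the count is at or above the limit; lift the block
        # as soon as the count is below the limit again.
        blocked = count >= limit
        rows.append((issued, count))
    return rows


# CC 3 has no problematic instruction selected; every other cycle has one ready.
ready = [True, True, False, True, True, True, True, True, True]
for cc, (issued, count) in enumerate(trace_issue(ready), start=1):
    print(f"CC {cc}: {'issue' if issued else 'no issue'}, count = {count}")
```

The trace gives counts of 1, 2, 1, 2, 3, 2, 3, 2, 3 for CC 1 through CC 9, with issue and block alternating from CC 6 onward, matching the 50% rate described.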
[0062] Referring to figure 7, a generalized block diagram of yet another embodiment of limiting an instruction issue rate is shown. As shown in this example, a count limit of 3 is used again. Before the limit is reached, the increment amount is 1 and the decrement amount is 1. An issue rate limit of 66% is selected. Therefore, in this example, the decrement amount changes to 2 after the limit is reached.
[0063] In CC 1 through CC 4, problematic instructions are issued to the execution core 130 based on availability and the selection logic within scheduler 124. A count is maintained, but the regulation logic does not block the issue of the problematic instructions until the limit value is reached in CC 5. Here, the count is now decremented by 2, instead of 1. In CC 6 through CC 8, the problematic instructions are issued to the execution core 130 while the count is incremented by 1. In CC 8, the limit is reached again. In CC 9, the problematic instructions are blocked from issue and the count is again decremented by 2, instead of 1. The issue rate for the problematic instructions has reached a maximum limit of two instructions issued per three clock cycles, or 66% of the time.
[0064] Referring now to figure 8, a generalized block diagram of yet another embodiment of limiting an instruction issue rate is shown. As shown in this example, a count limit of 3 is used again. Before the limit is reached, the increment amount is 1 and the decrement amount is 1. An issue rate limit of 33% is selected. Therefore, in this example, the increment amount changes to 2 after the limit is reached.
[0065] In CC 1 through CC 4, problematic instructions are issued to the execution core 130 based on availability and the selection logic within scheduler 124. A count is maintained, but the regulation logic does not block the issue of problematic instructions until the limit value is reached in CC 5. Here, the count is still decremented by 1. In CC 7, a problematic instruction is issued to the execution core 130 since the count is below the limit. However, the count is now incremented by 2, instead of 1, so the count changes from 2 to 4. The limit is exceeded, so regulation logic 340 sends control signals to scheduler 124 to block the issue of problematic instructions. In CC 8, the count is still at the limit, so problematic instructions remain blocked. The issue rate for the problematic instructions has reached a maximum limit of one instruction issued per three clock cycles, or 33% of the time.
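The effect of changing the increment or decrement amount once the limit has been reached can be checked numerically. The sketch below is a simplified model with assumptions called out: the function name long_run_issue_rate is hypothetical, a problematic instruction is assumed to be ready in every cycle (so the cycle numbers will not line up with those in figures 7 and 8), and blocking is assumed to apply whenever the count is at or above the limit. It reports only the long-run issue rate.

```python
def long_run_issue_rate(limit, post_increment, post_decrement, cycles=3000):
    """Fraction of cycles that issue when a candidate instruction is always ready.

    Both amounts are 1 until the limit is first reached; after that the
    post-limit amounts apply.
    """
    count, blocked, limit_reached, issued_cycles = 0, False, False, 0
    for _ in range(cycles):
        increment = post_increment if limit_reached else 1
        decrement = post_decrement if limit_reached else 1
        if not blocked:
            count += increment
            issued_cycles += 1
        else:
            count = max(0, count - decrement)
        blocked = count >= limit
        limit_reached = limit_reached or blocked
    return issued_cycles / cycles


for name, inc, dec in [("figure 6", 1, 1), ("figure 7", 1, 2), ("figure 8", 2, 1)]:
    print(name, round(long_run_issue_rate(3, inc, dec), 2))
```

In this model the three configurations settle near one half, two thirds, and one third, which correspond to the 50%, 66%, and 33% limits described for figures 6, 7, and 8.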
[0066] Turning now to figure 9, a generalized flowchart of one embodiment of a method 900 for controlling an instruction issue rate of specific instruction types is shown. Method 900 can be modified by those skilled in the art in order to derive alternative embodiments. Also, the steps in this embodiment are shown in sequential order. However, some steps may occur in a different order than shown, some steps may be performed concurrently, some steps may be combined with other steps, and some steps may be absent in another embodiment.
[0067] In the embodiment shown, instruction types can be selected as candidate instruction types for regulation in block 902. The instruction types associated with high power consumption during execution can be selected. The associated high power consumption may be due to an appreciable amount of processing performed during significant latency. Floating-point SIMD instructions, load/store instructions, and others can be candidates. One or more groups of candidate instruction types can be selected.
[0068] In block 904, a power regulation code can be determined during the execution of one or more software applications. In one embodiment, the power regulation code can be selected based on a power-performance state (P-state) estimated during the execution of the one or more software applications. In another embodiment, the power regulation code can be read from a control and status register written by a user through software. A given value for the power regulation code can be selected in order to select a given one of the groups of candidate instruction types and an issue rate limit. In one embodiment, a given limit for an instruction issue count can be selected based on the power regulation code. In an alternative embodiment, a given limit for an instruction issue rate, instead of the instruction issue count, can be selected based on the power regulation code.
[0069] In block 906, an instruction issue count for each of the selected groups of one or more instruction types is maintained during program execution. This count can be incremented or decremented by constant amounts determined before program execution. In an alternative embodiment, this count can be incremented and decremented by amounts read from a table such as power regulation table 310. In yet another embodiment, an instruction issue rate can be maintained, rather than a count as previously described.
[0070] If a maintained count or rate exceeds a given limit (conditional block 908), then in block 910, one or more groups of one or more instruction types of the candidate instruction types are selected for possible regulation. The amount of issue regulation can be based on at least the power regulation code. In block 912, an issue rate limit for the one or more selected groups is chosen based on at least the power regulation code. In block 914, the instruction issue rate(s) of the one or more selected groups is/are limited to the associated chosen issue rate(s). If the power regulation code does not change (conditional block 916), then the control flow of method 900 returns to block 906. Otherwise, in block 918, the limit for the instruction issue count can be updated based on the changed power regulation code. In another embodiment, the limit for the instruction issue rate may be updated based on the changed power regulation code. The control flow of method 900 then moves from block 918 to block 906.
[0071] Although the above embodiments have been described in considerable detail, numerous variations and modifications will be apparent to those skilled in the art once the above description is fully appreciated. The embodiments are intended to be interpreted as covering all such variations and modifications.
Claims:
Claims (14)
[0001]
1. Processor, characterized by the fact that it comprises: a scheduler (124) configured to select and issue instructions; an execution core (130) configured to receive and execute the issued instructions; and a power regulation unit (150), in which the power regulation unit (150) is configured to: maintain one or more instruction issue counts for one or more instruction types, wherein maintaining a given instruction issue count comprises: incrementing the given instruction issue count by a first quantity for each clock cycle in which an instruction of an associated instruction type is issued to the execution core (130); and decrementing the given instruction issue count by a second quantity for each clock cycle in which no instruction of an associated instruction type is issued to the execution core (130); and in response to determining that a given instruction issue count of the one or more instruction issue counts exceeds a limit: select at least one instruction type of the one or more instruction types for a limited instruction issue rate; and choose a new issue rate for the at least one instruction type.
[0002]
2. Processor, according to claim 1, characterized by the fact that the power regulation unit (150) is further configured to perform the selection and the choice based on a power regulation code written by software.
[0003]
3. Processor, according to claim 1, characterized by the fact that the power regulation unit (150) is further configured to perform the selection and the choice based on an operational power state of the processor.
[0004]
4. Processor, according to claim 3, characterized by the fact that the power regulation unit (150) is further configured to select a respective limit for each of the one or more instruction types based on the operational power state of the processor.
[0005]
5. Processor, according to claim 3, characterized by the fact that the power regulation unit (150) is further configured to perform the selection and the choice based additionally on which instruction issue count has exceeded a respective limit.
[0006]
6. Processor, according to claim 1, characterized by the fact that the first quantity and the second quantity are chosen so that a specific emission rate for instructions of the given type of instruction is achieved.
[0007]
7. Processor, according to claim 6, characterized by the fact that the first quantity and the second quantity are not the same.
[0008]
8. Processor, according to claim 6, characterized by the fact that the instruction types include single instruction multiple data (SIMD) instructions.
[0009]
9. Method, characterized by the fact that it comprises the steps of: maintaining one or more instruction issue counts for one or more instruction types, wherein maintaining a given instruction issue count comprises: incrementing the given instruction issue count by a first quantity for each clock cycle in which an instruction of an associated instruction type is issued to the execution core (130); and decrementing the given instruction issue count by a second quantity for each clock cycle in which no instruction of an associated instruction type is issued to the execution core (130); determining whether a given instruction issue count of the one or more instruction issue counts exceeds a given limit; and in response to the determination: selecting one or more instruction types to limit a respective issue rate; choosing an issue rate for each of the one or more selected instruction types; and limiting an associated issue rate for each of the one or more selected instruction types to a respective chosen issue rate.
[0010]
10. Method, according to claim 9, characterized by the fact that it further comprises performing the selection and the choice based on at least one of an operational power state of the processor and a power regulation code written by software.
[0011]
11. Method, according to claim 10, characterized by the fact that it further comprises selecting a respective limit for each of the one or more instruction types based on an operational power state of the processor.
[0012]
12. Method, according to claim 10, characterized by the fact that it further comprises performing the selection and the choice based additionally on which instruction issue count has exceeded a respective limit.
[0013]
13. Power regulation unit (150), characterized by the fact that it comprises: a first interface to a scheduler (124) configured to select and issue instructions; a second interface to an execution core (130) which is configured to receive and execute the issued instructions; and regulation control logic, in which the regulation control logic is configured to: maintain one or more instruction issue counts for one or more instruction types based on the detection of instructions issued through the second interface, where the instruction types are a subset of the supported instruction types executed by the execution core (130), and wherein maintaining a given instruction issue count comprises: incrementing the given instruction issue count by a first quantity for each clock cycle in which an instruction of an associated instruction type is issued to the execution core (130); and decrementing the given instruction issue count by a second quantity for each clock cycle in which no instruction of an associated instruction type is issued to the execution core (130); determine whether a given instruction issue count of the one or more instruction issue counts exceeds a given limit; and in response to the determination: select one or more instruction types to limit a respective issue rate; choose an issue rate for each of the one or more selected instruction types; and send control signals through the first interface that prevent an associated issue rate for each of the one or more selected instruction types from exceeding a respective chosen issue rate.
[0014]
14. Power regulation unit (150), according to claim 13, characterized by the fact that the regulation control logic is further configured to perform the selection and the choice based on at least one of a power regulation code written by software and an operational power state of the processor.
Patent family:
Publication number | Publication date
EP2587366B1|2015-03-04|
CN103092320B|2015-10-14|
TWI517043B|2016-01-11|
AU2012227209B2|2014-01-23|
AU2012227209A1|2013-05-16|
JP2013101605A|2013-05-23|
BR102012024721A2|2013-11-26|
CN103092320A|2013-05-08|
JP5853301B2|2016-02-09|
US20130111191A1|2013-05-02|
US9009451B2|2015-04-14|
TW201331835A|2013-08-01|
KR20130047577A|2013-05-08|
WO2013066519A1|2013-05-10|
EP2587366A1|2013-05-01|
KR101421346B1|2014-07-18|
Legal status:
2013-11-26| B03A| Publication of a patent application or of a certificate of addition of invention [chapter 3.1 patent gazette]|
2014-01-14| B03H| Publication of an application: rectification [chapter 3.8 patent gazette]|
2018-12-11| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-10-08| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2020-11-10| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-01-05| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 27/09/2012, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Patent title
US13/285,361|2011-10-31|
US13/285,361|US9009451B2|2011-10-31|2011-10-31|Instruction type issue throttling upon reaching threshold by adjusting counter increment amount for issued cycle and decrement amount for not issued cycle|